Selective Listening by Synchronizing Speech With Lips

Abstract

A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker speech mixture when given a cue that represents the target speaker, such as a pre-enrolled speech utterance or an accompanying video track. Visual cues are particularly useful when a pre-enrolled utterance is not available. In this work, we do not rely on the target speaker's pre-enrolled speech, but rather use the target speaker's face track as the speaker cue, referred to as the auxiliary reference, to form an attractor towards the target speaker. We advocate that the temporal synchronization between the speech and its accompanying lip movements is a direct and dominant audio-visual cue. Therefore, we propose a self-supervised pre-training strategy to exploit the speech-lip synchronization cue for target speaker extraction, which allows us to leverage abundant unlabeled in-domain data. We transfer the knowledge from the pre-trained model to an encoder in the speaker extraction network. We show that the proposed speaker extraction network outperforms various competitive baselines in terms of signal quality, perceptual quality, and intelligibility, achieving state-of-the-art performance.

Related resources

Brain activity during selective listening to natural speech.

Human brain functions involved in selective attention to particular sounds have been studied extensively with non-invasive measurements of electro-magnetic and hemodynamic brain activity. Here we review studies indicating that selection of the attended sounds for further processing occurs in the auditory cortex. The exact locus of this selection process in the auditory cortex appears to depend ...

Listening to and mimicking respiration: Understanding and synchronizing joint actions

The relation between action and respiration has received broad attention in the field of sport psychology since several psycho-physiological studies provided evidence of the mutual influences between respiration and performance management: breathing appears to be entrained to synchronous motor processes and, in return, to influence both the rhythm and the precision of simultaneous actions (Raßler, 20...

Testing Selective Transmission with Low Power Listening

Selective transmission policies allow nodes in a sensor network to autonomously decide between transmitting or discarding packets depending on the importance of the information content and the energetic cost of communications. The potential benefits of these policies depend on the capability of nodes to estimate their current energy consumption patterns. As a case study, this paper tests the perf...

Speech-Video Synchronization Using Lips Movements and Speech Envelope Correlation

In this paper, we propose a novel correlation based method for speech-video synchronization (synch) and relationship classification. The method uses the envelope of the speech signal and data extracted from the lips movement. Firstly, a nonlinear-time-varying model is considered to represent the speech signal as a sum of amplitude and frequency modulated (AM-FM) signals. Each AM-FM signal, in t...

The Effects of Speech Rate, Prosodic Features, and Blurred Speech on Iranian EFL Learners' Listening Comprehension

Keywords: effect of speech rate on listening comprehension, blurred speech, segmental and suprasegmental features, authentic speech, intelligibility, discrimination, omission, assimilation. Abstract: The speed of listening material in connected speech has always been a nightmare for second-language learners in general, and for Iranian listeners in particular. Despite the common-sense view that, with slower speech, listening comprehension activities ...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2022

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2022.3153258